A Comparative Study of combined Feature Selection Methods for Arabic Text Classification
نویسندگان
چکیده
Text classification is a very important task due to the huge amount of electronic documents. One of the problems of text classification is the high dimensionality of feature space. Researchers proposed many algorithms to select related features from text. These algorithms have been studied extensively for English text, while studies for Arabic are still limited. This study introduces an investigation on the performance of five widely used feature selection methods namely Chi-square, Correlation, GSS Coefficient, Information Gain and Relief F. In addition, this study also introduces an approach of combination of feature selection methods based on the average weight of the features. The experiments are conducted using Naïve Bayes and Support Vector Machine classifiers to classify a published Arabic corpus. The results show that the best results were obtained when using Information Gain method. The results also show that the combination of multiple feature selection methods outperforms the best results obtain by the individual methods.
منابع مشابه
An Improved Flower Pollination Algorithm with AdaBoost Algorithm for Feature Selection in Text Documents Classification
In recent years, production of text documents has seen an exponential growth, which is the reason why their proper classification seems necessary for better access. One of the main problems of classifying text documents is working in high-dimensional feature space. Feature Selection (FS) is one of the ways to reduce the number of text attributes. So, working with a great bulk of the feature spa...
متن کاملA Novel One Sided Feature Selection Method for Imbalanced Text Classification
The imbalance data can be seen in various areas such as text classification, credit card fraud detection, risk management, web page classification, image classification, medical diagnosis/monitoring, and biological data analysis. The classification algorithms have more tendencies to the large class and might even deal with the minority class data as the outlier data. The text data is one of t...
متن کاملSupport Vector Machines based Arabic Language Text Classification System: Feature Selection Comparative Study
Feature selection is essential for effective and accurate text classification systems. This paper investigates the effectiveness of six commonly used feature selection methods, Evaluation used an in-house collected Arabic text classification corpus, and classification is based on Support Vector Machine Classifier. The experimental results are presented in terms of precision, recall and Macroave...
متن کاملAn Improved Flower Pollination Algorithm with AdaBoost Algorithm for Feature Selection in Text Documents Classification
In recent years, production of text documents has seen an exponential growth, which is the reason why their proper classification seems necessary for better access. One of the main problems of classifying text documents is working in high-dimensional feature space. Feature Selection (FS) is one of the ways to reduce the number of text attributes. So, working with a great bulk of the feature spa...
متن کاملA New Approach for Text Documents Classification with Invasive Weed Optimization and Naive Bayes Classifier
With the fast increase of the documents, using Text Document Classification (TDC) methods has become a crucial matter. This paper presented a hybrid model of Invasive Weed Optimization (IWO) and Naive Bayes (NB) classifier (IWO-NB) for Feature Selection (FS) in order to reduce the big size of features space in TDC. TDC includes different actions such as text processing, feature extraction, form...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- JCS
دوره 10 شماره
صفحات -
تاریخ انتشار 2014